BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning

Gu, Jianyang, Stevens, Samuel, Campolongo, Elizabeth G, Thompson, Matthew J, Zhang, Net, Wu, Jiaman, Kopanev, Andrei, Mai, Zheda, White, Alexander E., Balhoff, James, Dahdul, Wasila, Rubenstein, Daniel, Lapp, Hilmar, Berger-Wolf, Tanya, Chao, Wei-Lun, Su, Yu

arXiv.org Artificial Intelligence

Foundation models trained at scale exhibit remarkable emergent behaviors, learning new capabilities beyond their initial training objectives. We find such emergent behaviors in biological vision models via large-scale contrastive vision-language training. To achieve this, we first curate TreeOfLife-200M, comprising 214 million images of living organisms, the largest and most diverse biological organism image dataset to date. We then train BioCLIP 2 on TreeOfLife-200M to distinguish different species. Despite the narrow training objective, BioCLIP 2 yields extraordinary accuracy when applied to various biological visual tasks such as habitat classification and trait prediction. We identify emergent properties in the learned embedding space of BioCLIP 2. At the inter-species level, the embedding distribution of different species aligns closely with functional and ecological meanings (e.g., beak sizes and habitats). At the intra-species level, instead of being diminished, the intra-species variations (e.g., life stages and sexes) are preserved and better separated in subspaces orthogonal to inter-species distinctions. We provide formal proof and analyses to explain why hierarchical supervision and contrastive objectives encourage these emergent properties. Crucially, our results reveal that these properties become increasingly significant with larger-scale training data, leading to a biologically meaningful embedding space.
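The contrastive vision-language objective described above can be sketched with a symmetric InfoNCE loss of the kind used in CLIP-style training. This is a minimal illustration, not BioCLIP 2's actual training code; the function name, temperature value, and embedding shapes are assumptions.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched image/text pairs attract,
    all other pairs in the batch repel."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N); matching pairs on the diagonal
    idx = np.arange(len(img))

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_p[idx, idx].mean()

    # average the image->text and text->image directions
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2
```

Training species labels through a loss like this pulls same-species embeddings together, which is the mechanism the paper argues gives rise to the emergent inter- and intra-species structure.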


Flies disguised as wasps can't fool birds

Popular Science

Despite their bee-like appearance, hoverflies are all buzz, no bite. The harmless insects, more closely related to midges than wasps, imitate their distant stinging cousins with stripes, high-contrast colors, and narrow waists. In theory, the "flies in wasps' clothing" use this strategy to ward off would-be predators without having to pay the cost of evolving venom and an appendage to inject it. The quality of hoverfly mimicry can vary, from detailed disguises to the insect equivalent of slapping on a pair of cat ears for a Halloween party.


Prompting Science Report 2: The Decreasing Value of Chain of Thought in Prompting

Meincke, Lennart, Mollick, Ethan, Mollick, Lilach, Shapiro, Dan

arXiv.org Artificial Intelligence

This is the second in a series of short reports that seek to help business, education, and policy leaders understand the technical details of working with AI through rigorous testing. In this report, we investigate Chain-of-Thought (CoT) prompting, a technique that encourages a large language model (LLM) to "think step by step" (Wei et al., 2022). CoT is a widely adopted method for improving reasoning tasks; however, our findings reveal a more nuanced picture of its effectiveness. We demonstrate two things:

- The effectiveness of Chain-of-Thought prompting can vary greatly depending on the type of task and model. For non-reasoning models, CoT generally improves average performance by a small amount, particularly if the model does not inherently engage in step-by-step processing by default. However, CoT can introduce more variability in answers, sometimes triggering occasional errors on questions the model would otherwise get right. We also found that many recent models perform some form of CoT reasoning even if not asked; for these models, a request to perform CoT had little impact. Performing CoT generally requires far more tokens (increasing cost and time) than direct answers.
- For models designed with explicit reasoning capabilities, CoT prompting often results in only marginal, if any, gains in answer accuracy, while significantly increasing the time and tokens needed to generate a response.
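The prompting contrast studied in the report can be sketched as two prompt templates. This is an illustrative helper, not the report's actual test harness; the function name and exact wording are assumptions.

```python
def build_prompt(question: str, chain_of_thought: bool = True) -> str:
    """Wrap a question as either a Chain-of-Thought prompt or a
    direct-answer prompt (illustrative templates only)."""
    if chain_of_thought:
        # CoT: ask the model to reason before answering (more tokens)
        return f"{question}\nLet's think step by step, then state the final answer."
    # Direct: ask for the answer only (fewer tokens, lower cost)
    return f"{question}\nAnswer with the final result only."
```

The report's cost observation follows directly from this contrast: the CoT variant invites the model to emit its intermediate reasoning, so responses are longer even when the final answer is unchanged.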


Why birds love a good chat during migration - and how they 'buddy up' with a pal for the long journey

Daily Mail - Science & tech

On a long-haul flight, there's nothing worse than being sat next to a chatty stranger. But songbirds don't seem to mind, as a new study suggests they are likely to 'talk' to other species as they migrate. Last year, a team of scientists discovered that birds seem to 'buddy up' with other species at stopover sites during migration, but there was no evidence that different species pair up or communicate vocally on the wing. But now it's been found that the birds may even chat to gather important information about the journey they are on. For their new study, the researchers, from the University of Illinois, analysed more than 18,000 hours of recorded flight calls made over three years in eastern North America.


CKSP: Cross-species Knowledge Sharing and Preserving for Universal Animal Activity Recognition

Mao, Axiu, Zhu, Meilu, Guo, Zhaojin, He, Zheng, Norton, Tomas, Liu, Kai

arXiv.org Artificial Intelligence

Deep learning techniques are dominating automated animal activity recognition (AAR) tasks with wearable sensors due to their high performance on large-scale labelled data. However, current deep learning-based AAR models are trained solely on datasets of individual animal species, which constrains their applicability in practice and leads to poor performance when training data are limited. In this study, we propose a one-for-many framework, dubbed Cross-species Knowledge Sharing and Preserving (CKSP), based on sensor data of diverse animal species. Given the coexistence of generic and species-specific behavioural patterns among different species, we design a Shared-Preserved Convolution (SPConv) module. This module assigns an individual low-rank convolutional layer to each species for extracting species-specific features and employs a shared full-rank convolutional layer to learn generic features, enabling the CKSP framework to learn inter-species complementarity and alleviate data limitations by increasing data diversity. Considering the training conflict arising from discrepancies in data distributions among species, we devise a Species-specific Batch Normalization (SBN) module, which uses multiple BN layers to separately fit the distributions of different species. To validate CKSP's effectiveness, experiments are performed on three public datasets from horses, sheep, and cattle, respectively. The results show that our approach remarkably boosts classification performance compared to the baseline method (a one-for-one framework) trained solely on individual-species data, with increments of 6.04%, 2.06%, and 3.66% in accuracy, and 10.33%, 3.67%, and 7.90% in F1-score for the horse, sheep, and cattle datasets, respectively. This demonstrates the promise of our method in leveraging multi-species data to improve classification performance.
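The shared full-rank / per-species low-rank decomposition behind SPConv can be sketched as follows. For simplicity this reduces the convolution to a pointwise (1x1) operation; the class name, shapes, and initialization are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class SPConv:
    """Sketch of the Shared-Preserved Convolution idea: one shared
    full-rank weight for generic cross-species features, plus a
    low-rank correction U_k @ V_k per species k."""

    def __init__(self, in_ch, out_ch, n_species, rank=2, seed=0):
        rng = np.random.default_rng(seed)
        # shared full-rank weight: generic features for all species
        self.shared = 0.1 * rng.normal(size=(out_ch, in_ch))
        # per-species low-rank factors: species-specific features
        self.U = 0.1 * rng.normal(size=(n_species, out_ch, rank))
        self.V = 0.1 * rng.normal(size=(n_species, rank, in_ch))

    def __call__(self, x, species):
        # x: (in_ch, time) sensor window; species: integer species index
        w = self.shared + self.U[species] @ self.V[species]
        return w @ x
```

The design choice is the same one made by low-rank adapters elsewhere: the shared weight carries most of the capacity and benefits from all species' data, while each cheap low-rank branch only has to model how its species deviates from the shared pattern.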


Label Confidence Weighted Learning for Target-level Sentence Simplification

Qiu, Xinying, Zhang, Jingshen

arXiv.org Artificial Intelligence

Multi-level sentence simplification generates simplified sentences at varying language proficiency levels. We propose Label Confidence Weighted Learning (LCWL), a novel approach that incorporates a label confidence weighting scheme into the training loss of an encoder-decoder model, setting it apart from existing confidence-weighting methods primarily designed for classification. Experiments on an English grade-level simplification dataset show that LCWL outperforms state-of-the-art unsupervised baselines. Fine-tuning the LCWL model on in-domain data and combining it with Symmetric Cross Entropy (SCE) consistently delivers better simplifications than strong supervised methods. Our results highlight the effectiveness of label confidence weighting techniques for text simplification tasks with encoder-decoder architectures.
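The core idea of weighting a training loss by label confidence can be sketched as a confidence-scaled negative log-likelihood. This is a generic illustration under assumed shapes; LCWL's exact weighting scheme may differ.

```python
import numpy as np

def confidence_weighted_nll(log_probs, targets, confidences):
    """Negative log-likelihood per target, scaled by label confidence,
    so uncertain reference labels contribute less to the training loss.
    log_probs: (n, vocab) log-probabilities; targets: (n,) indices;
    confidences: (n,) weights in [0, 1]."""
    nll = -log_probs[np.arange(len(targets)), targets]
    return float((confidences * nll).sum() / confidences.sum())
```

Down-weighting a target the model finds unlikely reduces that target's pull on the gradient, which is the intended effect when the reference label itself may be noisy.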


Generating Binary Species Range Maps

Dorm, Filip, Lange, Christian, Loarie, Scott, Mac Aodha, Oisin

arXiv.org Artificial Intelligence

Accurately predicting the geographic ranges of species is crucial for assisting conservation efforts. Traditionally, range maps were manually created by experts. However, species distribution models (SDMs) and, more recently, deep learning-based variants offer a potential automated alternative. Deep learning-based SDMs generate a continuous probability representing the predicted presence of a species at a given location, which must be binarized by setting per-species thresholds to obtain binary range maps. However, selecting appropriate per-species thresholds to binarize these predictions is non-trivial as different species can require distinct thresholds. In this work, we evaluate different approaches for automatically identifying the best thresholds for binarizing range maps using presence-only data. This includes approaches that require the generation of additional pseudo-absence data, along with ones that only require presence data. We also propose an extension of an existing presence-only technique that is more robust to outliers. We perform a detailed evaluation of different thresholding techniques on the tasks of binary range estimation and large-scale fine-grained visual classification, and we demonstrate improved performance over existing pseudo-absence free approaches using our method.
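The binarization step described above can be sketched in a few lines. The percentile heuristic below is one common presence-only strategy (set each species' threshold at a lower percentile of its scores at known presence records, so a single outlier record cannot drag the threshold down); it is an illustrative stand-in, not the paper's proposed method, and all names and shapes are assumptions.

```python
import numpy as np

def presence_percentile_thresholds(presence_scores, pct=5.0):
    """One threshold per species, taken as a lower percentile of the
    model's scores at that species' known presence locations."""
    return np.array([np.percentile(s, pct) for s in presence_scores])

def binarize_ranges(probs, thresholds):
    """probs: (n_species, n_locations) continuous SDM outputs.
    Returns boolean range maps using a distinct threshold per species."""
    return probs >= thresholds[:, None]
```

Because the threshold is chosen per species, a rare habitat specialist and a widespread generalist can be binarized sensibly from the same model, which is the non-trivial part the paper evaluates.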


DNABERT-S: Learning Species-Aware DNA Embedding with Genome Foundation Models

Zhou, Zhihan, Wu, Weimin, Ho, Harrison, Wang, Jiayi, Shi, Lizhen, Davuluri, Ramana V, Wang, Zhong, Liu, Han

arXiv.org Artificial Intelligence

Effective DNA embedding remains crucial in genomic analysis, particularly in scenarios lacking labeled data for model fine-tuning, despite the significant advancements in genome foundation models. A prime example is metagenomics binning, a critical process in microbiome research that aims to group DNA sequences by species from a complex mixture of DNA sequences derived from potentially thousands of distinct, often uncharacterized species. To address the lack of effective DNA embedding models, we introduce DNABERT-S, a genome foundation model that specializes in creating species-aware DNA embeddings. To encourage effective embeddings for error-prone long-read DNA sequences, we introduce Manifold Instance Mixup (MI-Mix), a contrastive objective that mixes the hidden representations of DNA sequences at randomly selected layers and trains the model to recognize and differentiate these mixed proportions at the output layer. We further enhance it with the proposed Curriculum Contrastive Learning (C$^2$LR) strategy. Empirical results on 18 diverse datasets show DNABERT-S's remarkable performance. With just 2-shot training, it outperforms the top baseline's 10-shot species classification performance, while doubling the Adjusted Rand Index (ARI) in species clustering and substantially increasing the number of correctly identified species in metagenomics binning. The code, data, and pre-trained model are publicly available at https://github.com/Zhihan1996/DNABERT_S.
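The mixing operation at the heart of MI-Mix can be sketched as a convex combination of two sequences' hidden states at a chosen layer. This is a sketch of the idea only; the layer-selection logic and the output head that recovers the mixing ratio are omitted, and the function name is an assumption.

```python
import numpy as np

def manifold_instance_mix(hidden_a, hidden_b, lam):
    """Mix the layer-l hidden states of two DNA sequences with ratio lam.
    In MI-Mix, the model is then trained so its output can recover lam,
    which forces embeddings to vary smoothly between the two sequences."""
    assert 0.0 <= lam <= 1.0
    return lam * hidden_a + (1.0 - lam) * hidden_b
```

Mixing in hidden space rather than input space is the standard manifold-mixup design choice: interpolated points stay near the model's learned representation manifold instead of producing implausible raw sequences.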


Using AI to see how well past extinctions can predict future biodiversity loss

#artificialintelligence

Evidence from past extinctions cannot be used as a definitive way of predicting future biodiversity loss, scientists have found by using AI. A team of researchers including Dr. James Witts of the University of Bristol's School of Earth Sciences and led by Dr. William Foster from Hamburg University used fossils from previous mass extinctions to see if AI-generated models can accurately predict extinction vulnerability. Despite expectations, this research found that mass extinctions could not be used to generate predictive models for other biodiversity crises, with no common cause flagged. This is because marine communities are constantly evolving, and no two mass extinctions have impacted the same marine ecosystem. Co-author Dr. Witts explained, "In a time of increasing extinction risk, knowing whether we can make predictions about the vulnerabilities of different organisms to extinction is essential."


Bird Buddy smart feeder uses AI to identify over 1,000 feathered friends in your backyard

#artificialintelligence

CEO and co-founder Franci Zidar says the bird feeder's modular design will allow for continuous hardware and software upgrades. Bird Buddy, the creator of a smart bird feeder that takes pictures of feathered friends visiting your yard, has announced a new gadget that can accurately identify hundreds of different species of hummingbirds, even while they are in flight. Franci Zidar, the CEO and co-founder of Bird Buddy, told FOX Business at the Consumer Electronics Show 2023 (CES) that he and his friend came up with the idea for the smart feeder in a way familiar to many young, eager entrepreneurs – over several late-night conversations and a couple of beers. Bird Buddy boasts a modular design, with a detachable white center that can be removed and inserted into other housings to activate additional features. "We're launching a hummingbird feeder that you can swap out, take your existing module, put it in there, and unlock a new species of birds," Zidar said. The new AI-integrated Smart Hummingbird Feeder can take high-quality photos and videos and accurately identify 350 different species of hummingbirds, even those with wing speeds reaching 60 mph.